Data Annotation


From LLM-annotation to LLM-orchestrator: Coordinating Small Models for Data Labeling

Lu, Yao, Ji, Zhaiyuan, Du, Jiawei, Yu, Shanqing, Xuan, Qi, Zhou, Tianyi

arXiv.org Artificial Intelligence

Although the annotation paradigm based on Large Language Models (LLMs) has made significant breakthroughs in recent years, its practical deployment faces two core bottlenecks: first, calling commercial APIs for large-scale annotation is very expensive; second, in scenarios that require fine-grained semantic understanding, such as sentiment classification and toxicity classification, the annotation accuracy of LLMs is even lower than that of Small Language Models (SLMs) dedicated to the domain. To address these problems, we propose a new paradigm of multi-model cooperative annotation and design AutoAnnotator, a fully automatic annotation framework built on it. Specifically, AutoAnnotator consists of two layers. The upper meta-controller layer uses the generation and reasoning capabilities of LLMs to select SLMs for annotation, automatically generate annotation code, and verify difficult samples; the lower task-specialist layer consists of multiple SLMs that perform annotation through multi-model voting. In addition, we use the difficult samples flagged by the meta-controller layer's secondary review as a reinforcement learning set and fine-tune the SLMs in stages through a continual learning strategy, thereby improving their generalization. Extensive experiments show that AutoAnnotator outperforms existing open-source/API LLMs under zero-shot, one-shot, CoT, and majority-voting settings. Notably, AutoAnnotator reduces annotation cost by 74.15% compared to annotating directly with GPT-3.5-turbo, while still improving accuracy by 6.21%. Project page: https://github.com/Zhaiyuan-Ji/AutoAnnotator.
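The abstract describes the architecture only at a high level; as a minimal sketch of the two-layer idea, assuming hypothetical `slm_annotators` and `llm_review` callables (not the actual AutoAnnotator API, which also generates annotation code and fine-tunes the SLMs), the task-specialist layer could label by majority vote and escalate low-agreement samples to the LLM meta-controller:

```python
# Illustrative sketch only: majority voting across SLMs with LLM escalation.
# `slm_annotators` and `llm_review` are hypothetical stand-ins, not the real
# AutoAnnotator interfaces.
from collections import Counter
from typing import Callable, Iterable

def annotate(texts: Iterable[str],
             slm_annotators: list[Callable[[str], str]],
             llm_review: Callable[[str], str],
             min_agreement: float = 0.75) -> list[dict]:
    results = []
    for text in texts:
        votes = Counter(slm(text) for slm in slm_annotators)
        label, count = votes.most_common(1)[0]
        if count / len(slm_annotators) >= min_agreement:
            results.append({"text": text, "label": label, "source": "slm_vote"})
        else:
            # Difficult sample: defer to the LLM meta-controller for review.
            results.append({"text": text, "label": llm_review(text), "source": "llm_review"})
    return results
```

In the paper, the samples routed to the LLM are also reused as a fine-tuning set for the SLMs under a continual learning schedule, which this sketch omits.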


Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability

Winata, Genta Indra, Anugraha, David, Liu, Emmy, Aji, Alham Fikri, Hung, Shou-Yi, Parashar, Aditya, Irawan, Patrick Amadeus, Zhang, Ruochen, Yong, Zheng-Xin, Cruz, Jan Christian Blaise, Muennighoff, Niklas, Kim, Seungone, Zhao, Hanyang, Kar, Sudipta, Suryoraharjo, Kezia Erina, Adilazuarda, M. Farid, Lee, En-Shiun Annie, Purwarianti, Ayu, Wijaya, Derry Tanti, Choudhury, Monojit

arXiv.org Artificial Intelligence

High-quality datasets are fundamental to training and evaluating machine learning models, yet their creation, especially with accurate human annotations, remains a significant challenge. Many dataset paper submissions lack originality, diversity, or rigorous quality control, and these shortcomings are often overlooked during peer review. Submissions also frequently omit essential details about dataset construction and properties. While existing tools such as datasheets aim to promote transparency, they are largely descriptive and do not provide standardized, measurable methods for evaluating data quality. Similarly, metadata requirements at conferences promote accountability but are inconsistently enforced. To address these limitations, this position paper advocates for the integration of systematic, rubric-based evaluation metrics into the dataset review process, particularly as submission volumes continue to grow. We also explore scalable, cost-effective methods for synthetic data generation, including dedicated tools and LLM-as-a-judge approaches, to support more efficient evaluation. As a call to action, we introduce DataRubrics, a structured framework for assessing the quality of both human- and model-generated datasets. Leveraging recent advances in LLM-based evaluation, DataRubrics offers a reproducible, scalable, and actionable solution for dataset quality assessment, enabling both authors and reviewers to uphold higher standards in data-centric research. We also release code to support reproducibility of LLM-based evaluations at https://github.com/datarubrics/datarubrics.
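DataRubrics itself defines the rubric; the snippet below only illustrates, under assumed placeholder dimensions, how a rubric-based review could be represented so that either a human reviewer or an LLM judge fills in the scores:

```python
# Hypothetical rubric layout for dataset review; the dimensions below are
# illustrative placeholders, not the criteria defined by DataRubrics.
from dataclasses import dataclass

@dataclass
class RubricItem:
    dimension: str            # e.g. originality, documentation, quality control
    description: str          # what the reviewer (human or LLM judge) should check
    score: int | None = None  # filled in on a fixed scale, e.g. 1-5

def overall(items: list[RubricItem]) -> float:
    scored = [i.score for i in items if i.score is not None]
    return sum(scored) / len(scored) if scored else float("nan")

rubric = [
    RubricItem("originality", "Does the dataset add something new over prior corpora?"),
    RubricItem("annotation quality", "Is inter-annotator agreement reported and adequate?"),
    RubricItem("documentation", "Are collection and licensing details disclosed?"),
]
```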


Incentivizing High-Quality Human Annotations with Golden Questions

Liu, Shang, Cai, Zhongze, Wang, Hanzhao, Ma, Zhongyao, Li, Xiaocheng

arXiv.org Machine Learning

Human-annotated data plays a vital role in training large language models (LLMs), such as supervised fine-tuning and human preference alignment. However, it is not guaranteed that paid human annotators produce high-quality data. In this paper, we study how to incentivize human annotators to do so. We start from a principal-agent model of the dynamics between the company (the principal) and the annotator (the agent), where the principal can only monitor the annotation quality by examining $n$ samples. We investigate the maximum likelihood estimators (MLE) and the corresponding hypothesis testing to incentivize annotators: the agent is given a bonus if the MLE passes the test. By analyzing the variance of the outcome, we show that the strategic behavior of the agent makes the hypothesis testing very different from traditional ones: unlike the exponential rate proved by large deviation theory, the principal-agent model's hypothesis testing rate is of $\Theta(1/\sqrt{n \log n})$. Our theory implies two criteria for the golden questions used to monitor the performance of the annotators: they should be of (1) high certainty and (2) similar format to normal ones. In that light, we select a set of golden questions in human preference data. Through incentive-compatible experiments, we find that the annotators' behavior is better revealed by those golden questions than by traditional survey techniques such as instructed manipulation checks.
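As a toy illustration of the monitoring step, the sketch below checks an annotator's agreement on $n$ golden questions against a threshold whose slack shrinks roughly like $1/\sqrt{n \log n}$; the `base_quality` parameter and the threshold form are assumptions for illustration, not the paper's exact MLE-based test:

```python
import math

def passes_golden_check(answers: list[str], gold: list[str],
                        base_quality: float = 0.9) -> bool:
    """Toy bonus decision on n golden questions. The slack term only mirrors
    the flavor of the Theta(1/sqrt(n log n)) rate from the paper; the actual
    test is based on maximum likelihood estimates."""
    n = len(gold)
    if n < 2:
        return False  # too few golden questions to test anything
    agreement = sum(a == g for a, g in zip(answers, gold)) / n
    slack = 1.0 / math.sqrt(n * math.log(n))
    return agreement >= base_quality - slack
```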


LLMs as Data Annotators: How Close Are We to Human Performance

Haq, Muhammad Uzair Ul, Rigoni, Davide, Sperduti, Alessandro

arXiv.org Artificial Intelligence

In NLP, fine-tuning LLMs is effective for various applications but requires high-quality annotated data. However, manual annotation of data is labor-intensive, time-consuming, and costly. Therefore, LLMs are increasingly used to automate the process, often employing in-context learning (ICL), in which some examples related to the task are given in the prompt for better performance. However, manually selecting context examples can lead to inefficiencies and suboptimal model performance. This paper presents comprehensive experiments comparing several LLMs, considering different embedding models, across various datasets for the Named Entity Recognition (NER) task. The evaluation encompasses models with approximately 7B and 70B parameters, including both proprietary and non-proprietary models. Furthermore, leveraging the success of Retrieval-Augmented Generation (RAG), it also considers a method that addresses the limitations of ICL by automatically retrieving contextual examples, thereby enhancing performance. The results highlight the importance of selecting the appropriate LLM and embedding model, understanding the trade-offs between LLM sizes and desired performance, and the necessity to direct research efforts towards more challenging datasets.
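A minimal sketch of the retrieval idea, assuming a hypothetical `embed` callable standing in for an embedding model: embed the unlabeled sentence, pick the k most similar labeled examples, and place them in the prompt as in-context demonstrations.

```python
# Sketch of retrieval-augmented example selection for in-context NER
# annotation. `embed` is a hypothetical callable; cosine similarity is
# computed with the standard library only.
import math
from typing import Callable, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_prompt(query: str,
                 pool: list[tuple[str, str]],          # (sentence, NER annotation)
                 embed: Callable[[str], list[float]],
                 k: int = 4) -> str:
    q_vec = embed(query)
    ranked = sorted(pool, key=lambda ex: cosine(embed(ex[0]), q_vec), reverse=True)
    demos = "\n\n".join(f"Sentence: {s}\nEntities: {y}" for s, y in ranked[:k])
    return f"{demos}\n\nSentence: {query}\nEntities:"
```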


How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond

Huang, Chen, Deng, Yang, Lei, Wenqiang, Lv, Jiancheng, Chua, Tat-Seng, Huang, Jimmy Xiangji

arXiv.org Artificial Intelligence

Advancements in NLP research have been greatly propelled by large language models (LLMs), which have showcased exceptional abilities (Zhao et al., 2023; Laskar et al., 2024). These advancements are paving the way for the development of AI models that can behave as autonomous agents, working alongside humans to tackle intricate tasks. These models, for example, can cooperate with humans on data annotation (Klie et al., 2020; Li et al., 2023a; Huang et al., 2024c), information seeking (Deng et al., 2023a; Wang et al., 2023b; Zhang et al., 2024d), creative writing (Padmakumar and He, 2022; Akoury et al., 2020) and real-world problem solving (Mehta et al., 2023; Feng et al., 2024; Qian et al., 2024). Given all these elements, the information on particular details about how to formalize an effective human-model cooperation to achieve collective outputs is rather under-specified and scattered. Therefore, a comprehensive and systematic analysis of the underlying principles and formalizations of human-model cooperation is still absent. This gap in understanding presents a significant opportunity for advancement, enabling us to develop a deeper understanding of the fundamental basics that govern the effective cooperation between humans and intelligent models. To fill this gap, in this survey, we take the first step to summarize the principles, formalizations, ...


Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop

Artemova, Ekaterina, Tsvigun, Akim, Schlechtweg, Dominik, Fedorova, Natalia, Tilga, Sergei, Chernyshev, Konstantin, Obmoroshev, Boris

arXiv.org Artificial Intelligence

Training and deploying machine learning models relies on a large amount of human-annotated data. As human labeling becomes increasingly expensive and time-consuming, recent research has developed multiple strategies to speed up annotation and reduce costs and human workload: generating synthetic training data, active learning, and hybrid labeling. This tutorial is oriented toward practical applications: we will present the basics of each strategy, highlight their benefits and limitations, and discuss in detail real-life case studies. Additionally, we will walk through best practices for managing human annotators and controlling the quality of the final dataset. The tutorial includes a hands-on workshop, where attendees will be guided in implementing a hybrid annotation setup. This tutorial is designed for NLP practitioners from both research and industry backgrounds who are involved in or interested in optimizing data labeling projects.
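As a minimal sketch of the hybrid labeling setup discussed in the tutorial, assuming a hypothetical `model_predict` that returns a label and a confidence score, confident items are auto-labeled and the rest are queued for human annotators:

```python
# Minimal hybrid-labeling split; `model_predict` and the threshold value are
# illustrative assumptions, not part of the tutorial's materials.
from typing import Callable

def split_for_hybrid_labeling(items: list[str],
                              model_predict: Callable[[str], tuple[str, float]],
                              threshold: float = 0.9):
    auto_labeled, human_queue = [], []
    for item in items:
        label, confidence = model_predict(item)
        if confidence >= threshold:
            auto_labeled.append((item, label))
        else:
            human_queue.append(item)   # least confident items go to annotators
    return auto_labeled, human_queue
```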


On Limitations of LLM as Annotator for Low Resource Languages

Jadhav, Suramya, Shanbhag, Abhay, Thakurdesai, Amogh, Sinare, Ridhima, Joshi, Raviraj

arXiv.org Artificial Intelligence

Low-resource languages face significant challenges due to the lack of sufficient linguistic data, resources, and tools for tasks such as supervised learning, annotation, and classification. This shortage hinders the development of accurate models and datasets, making it difficult to perform critical NLP tasks like sentiment analysis or hate speech detection. To bridge this gap, Large Language Models (LLMs) present an opportunity as potential annotators, capable of generating datasets and resources for these underrepresented languages. In this paper, we focus on Marathi, a low-resource language, and evaluate the performance of both closed-source and open-source LLMs as annotators. We assess models such as GPT-4o, Gemini 1.0 Pro, Gemma 2 (2B and 9B), and Llama 3.1 (8B) on classification tasks including sentiment analysis, news classification, and hate speech detection. Our findings reveal that while LLMs excel in annotation tasks for high-resource languages like English, they still fall short when applied to Marathi. Even advanced closed models like Gemini and GPT underperform in comparison to BERT-based baselines, highlighting the limitations of LLMs as annotators for low-resource languages.


LLM Chain Ensembles for Scalable and Accurate Data Annotation

Farr, David, Manzonelli, Nico, Cruickshank, Iain, Starbird, Kate, West, Jevin

arXiv.org Artificial Intelligence

The ability of large language models (LLMs) to perform zero-shot classification makes them viable solutions for data annotation in rapidly evolving domains where quality labeled data is often scarce and costly to obtain. However, the large-scale deployment of LLMs can be prohibitively expensive. This paper introduces an LLM chain ensemble methodology that aligns multiple LLMs in a sequence, routing data subsets to subsequent models based on classification uncertainty. This approach leverages the strengths of individual LLMs within a broader system, allowing each model to handle data points where it exhibits the highest confidence, while forwarding more complex cases to potentially more robust models. Our results show that the chain ensemble method often exceeds the performance of the best individual model in the chain and achieves substantial cost savings, making LLM chain ensembles a practical and efficient solution for large-scale data annotation challenges.
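A minimal sketch of the routing logic, assuming each chain entry is a hypothetical `(predict, threshold)` pair where `predict` returns a label and a confidence score: every model labels the items it is confident about and forwards the rest, with the final model labeling whatever remains.

```python
# Sketch of uncertainty-based routing through a chain of models. The
# (predict, threshold) pairs are illustrative assumptions, not the paper's API.
from typing import Callable

Chain = list[tuple[Callable[[str], tuple[str, float]], float]]

def chain_ensemble(items: list[str], chain: Chain) -> dict[str, str]:
    labels: dict[str, str] = {}
    remaining = list(items)
    for i, (predict, threshold) in enumerate(chain):
        forwarded = []
        last_model = i == len(chain) - 1
        for item in remaining:
            label, confidence = predict(item)
            if confidence >= threshold or last_model:
                labels[item] = label
            else:
                forwarded.append(item)   # defer to a later (stronger) model
        remaining = forwarded
    return labels
```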


Can Vision-Language Models Replace Human Annotators: A Case Study with CelebA Dataset

Lu, Haoming, Zhong, Feifei

arXiv.org Artificial Intelligence

This study evaluates the capability of Vision-Language Models (VLMs) in image data annotation by comparing their performance on the CelebA dataset, in terms of quality and cost-effectiveness, against manual annotation. Annotations from the state-of-the-art LLaVA-NeXT model on 1000 CelebA images are in 79.5% agreement with the original human annotations. Incorporating re-annotations of disagreed cases into a majority vote boosts AI annotation consistency to 89.1%, and even higher for more objective labels. Cost assessments demonstrate that AI annotation significantly reduces expenditures compared to traditional manual methods, representing less than 1% of the costs for manual annotation in the CelebA dataset. These findings support the potential of VLMs as a viable, cost-effective alternative for specific annotation tasks, reducing both financial burden and ethical concerns associated with large-scale manual data annotation. The AI annotations and re-annotations utilized in this study are available on GitHub.
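A minimal sketch of the disagreement-resolution step, assuming a hypothetical `vlm_annotate` callable: labels that match the original annotation are kept, and disagreements are re-annotated several times and resolved by majority vote.

```python
# Sketch of the re-annotation-and-vote step for attributes where the VLM and
# the original label disagree. `vlm_annotate` is a hypothetical stand-in.
from collections import Counter
from typing import Callable

def resolve_label(image_id: str, attribute: str, original: int,
                  vlm_annotate: Callable[[str, str], int],
                  repeats: int = 3) -> int:
    first = vlm_annotate(image_id, attribute)
    if first == original:
        return first                       # agreement: keep the label
    votes = Counter(vlm_annotate(image_id, attribute) for _ in range(repeats))
    return votes.most_common(1)[0][0]      # majority of re-annotations
```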


Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs

Wang, Yifan, Stevens, David, Shah, Pranay, Jiang, Wenwen, Liu, Miao, Chen, Xu, Kuo, Robert, Li, Na, Gong, Boying, Lee, Daniel, Hu, Jiabo, Zhang, Ning, Kamma, Bob

arXiv.org Artificial Intelligence

The growing demand for AI training data has transformed data annotation into a global industry, but traditional approaches relying on human annotators are often time-consuming, labor-intensive, and prone to inconsistent quality. We propose the Model-in-the-Loop (MILO) framework, which integrates AI/ML models into the annotation process. Our research introduces a collaborative paradigm that leverages the strengths of both professional human annotators and large language models (LLMs). By employing LLMs as pre-annotation and real-time assistants, and as judges of annotator responses, MILO enables effective interaction patterns between human annotators and LLMs. Three empirical studies on multimodal data annotation demonstrate MILO's efficacy in reducing handling time, improving data quality, and enhancing annotator experiences. We also introduce quality rubrics for flexible evaluation and fine-grained feedback on open-ended annotations. The MILO framework has implications for accelerating AI/ML development, reducing reliance on human annotation alone, and promoting better alignment between human and machine values.
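A minimal sketch of one MILO-style pass, with hypothetical `llm_preannotate`, `human_edit`, and `llm_judge` callables and placeholder rubric keys: the model drafts an annotation, the human corrects it, and an LLM judge scores the result.

```python
# Sketch of one model-in-the-loop annotation pass. All three callables and
# the rubric dimensions are illustrative assumptions, not the MILO API.
from typing import Callable

def milo_pass(item: str,
              llm_preannotate: Callable[[str], str],
              human_edit: Callable[[str, str], str],
              llm_judge: Callable[[str, str], dict[str, int]]) -> dict:
    draft = llm_preannotate(item)        # model drafts a label/answer
    final = human_edit(item, draft)      # annotator corrects the draft
    scores = llm_judge(item, final)      # e.g. {"completeness": 4, "clarity": 5}
    return {"item": item, "annotation": final, "rubric_scores": scores}
```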